Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Authors: Combes, Richard; Proutiere, Alexandre
Publication date: 01/01/2014

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has recently been investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong to a bounded interval. For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. Our algorithm optimally exploits the unimodal structure of the problem, and, surprisingly, its asymptotic regret does not depend on the number of arms. We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards evolve smoothly over time. The analytical results are supported by numerical experiments showing that OSUB performs significantly better than state-of-the-art algorithms. For continuous sets of arms, we provide a brief discussion: we show that combining an appropriate discretization of the set of arms with the UCB algorithm yields order-optimal regret and, in practice, outperforms recently proposed algorithms designed to exploit the unimodal structure.
Comment: ICML 2014 (technical report). arXiv admin note: text overlap with arXiv:1307.730

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

HAL-Rennes 1
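
The Unimodal Bandits abstract above states that, for a continuous set of arms, discretizing the interval and running UCB yields order-optimal regret. A minimal Python sketch of that idea follows. It assumes Bernoulli rewards, the unit interval [0, 1] as the arm set, a uniform grid, and the classic UCB1 index; the mean-reward function `mean_reward`, the grid-size rule `ceil(T ** 0.25)`, and the helper `discretized_ucb` are hypothetical choices for illustration, not the paper's exact discretization and not its OSUB algorithm.

```python
import math
import random


def mean_reward(x):
    # Hypothetical unimodal mean reward on [0, 1], peaked at x = 0.3.
    return 0.9 - 0.8 * abs(x - 0.3)


def discretized_ucb(horizon, num_points, seed=0):
    """Run UCB1 on a uniform grid of `num_points` (>= 2) arms in [0, 1].

    Rewards are Bernoulli with mean `mean_reward(x)`. Returns the grid
    point that was pulled most often and the pull count of every arm.
    """
    rng = random.Random(seed)
    grid = [k / (num_points - 1) for k in range(num_points)]
    counts = [0] * num_points
    sums = [0.0] * num_points
    for t in range(1, horizon + 1):
        if t <= num_points:
            arm = t - 1  # initialisation: play each grid point once
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(num_points),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = 1.0 if rng.random() < mean_reward(grid[arm]) else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(range(num_points), key=lambda k: counts[k])
    return grid[best], counts


if __name__ == "__main__":
    T = 20000
    K = max(2, math.ceil(T ** 0.25))  # hypothetical grid-size rule
    x_hat, counts = discretized_ucb(T, K)
    print(f"grid size {K}, most-played point {x_hat:.3f}")
```

The grid size trades off discretization error (the best grid point may sit slightly off the true peak) against the cost of exploring more arms; the rule used here is only a placeholder for whatever discretization the analysis prescribes.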

Dynamic Rate and Channel Selection in Cognitive Radio Systems

Authors: Combes, Richard; Proutiere, Alexandre
Publication date: 12/05/2014

In this paper, we investigate dynamic channel and rate selection in cognitive radio systems that exploit a large number of channels free from primary users. In such systems, transmitters may rapidly change the selected (channel, rate) pair to opportunistically learn and track the pair offering the highest throughput. We formulate the problem of sequential channel and rate selection as an online optimization problem and show its equivalence to a structured Multi-Armed Bandit problem. The structure stems from inherent properties of the achieved throughput as a function of the selected channel and rate. We derive fundamental performance limits satisfied by any channel and rate adaptation algorithm, and propose algorithms that achieve (or approach) these limits. In turn, the proposed algorithms optimally exploit the inherent structure of the throughput. We illustrate the efficiency of our algorithms using both test-bed and simulation experiments, in both stationary and non-stationary radio environments. In stationary environments, the probabilities of successful packet transmission at the various (channel, rate) pairs do not evolve over time, whereas in non-stationary environments they may. In practical scenarios, the proposed algorithms track the best channel and rate quite accurately without the need for any explicit measurement of, or feedback on, the quality of the various channels.
Comment: 19 pages

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

HAL-Rennes 1

Mixed Polling with Rerouting and Applications

Authors: Combes, Richard; Kavitha, Veeraruna
Publication date: 01/01/2013

Queueing systems with a single server in which customers wait to be served at a finite number of distinct locations (buffers/queues) are called discrete polling systems. Polling systems in which users can arrive anywhere in a continuum are called continuous polling systems. Often one encounters a combination of the two: some users arrive in a continuum while others wait in a finite set (i.e., at a finite number of queues). We call these mixed polling systems. Moreover, in some applications, customers are rerouted to a new location (for another service) after their service is completed. In this work, we study mixed polling systems with rerouting. We obtain their steady-state performance by discretization, using the known pseudo-conservation laws of discrete polling systems: their stationary expected workload is obtained as the limit of the stationary expected workload of a discrete system. The main tools of our analysis are (a) fixed-point analysis of infinite-dimensional operators and (b) the convergence of Riemann sums to an integral. We analyze two applications of our results on mixed polling systems and discuss optimal system design. First, we consider a local area network in which a moving ferry facilitates communication (data transfer) over a wireless link. Second, we consider a distributed waste collection system and derive the optimal collection point. In both examples, service requests can arrive anywhere in a subset of the two-dimensional plane: some users arrive in a continuous set while others wait for service in a finite set. The only polling systems that can model these applications are mixed systems with rerouting, as introduced in this manuscript.
Comment: to appear in Performance Evaluation

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

Dspace at IIT Bombay

HAL-Rennes 1

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

Authors: Combes, Richard; Magureanu, Stefan; Proutiere, Alexandre
Publication date: 01/01/2014

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic, problem-specific lower bounds on the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. Numerical experiments show that this approach significantly outperforms existing algorithms that deal directly with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities.
Comment: COLT 201

arXiv.org e-Print Archive

HAL-CentraleSupelec

Publikationer från KTH

CiteSeerX

Digitala Vetenskapliga Arkivet - Academic Archive On-line

HAL-Rennes 1
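
The Lipschitz Bandits abstract above builds on KL divergences between empirical and true reward distributions. As a hedged illustration of that basic ingredient only, the sketch below computes the standard KL-UCB index for Bernoulli rewards: the largest mean q such that n * KL(mean, q) <= log(t) + c * log(log(t)), found by bisection. It is not CKL-UCB or OSLB, which additionally exploit the Lipschitz structure across arms; the function names and the exploration constant `c` are assumptions.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Ber(p) || Ber(q)), with both arguments clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= exploration budget.

    `mean` is the empirical mean of an arm pulled `pulls` times, and `t`
    is the current round. KL(mean, q) increases in q for q >= mean, so
    bisection on [mean, 1] finds the boundary.
    """
    budget = math.log(t)
    if t > math.e:
        budget += c * math.log(math.log(t))  # optional log-log term
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still plausible: move the lower end up
        else:
            hi = mid  # mid is too optimistic: move the upper end down
    return lo
```

An index policy built on this function would play, in each round, the arm with the largest `kl_ucb_index`; with more pulls the index shrinks towards the empirical mean, so well-explored arms receive a smaller optimism bonus.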

Hierarchical Beamforming: Resource Allocation, Fairness and Flow Level Performance

Authors: Altman, Zwi; Combes, Richard; Floquet, Julien
Publication date: 17/05/2018

We consider hierarchical beamforming in wireless networks. For a given population of flows, we propose computationally efficient algorithms for fair rate allocation, including proportional fairness and max-min fairness. We next propose closed-form formulas for flow-level performance, for both elastic traffic (with either proportional fairness or max-min fairness) and streaming traffic. We further assess the performance of hierarchical beamforming using numerical experiments. Since the proposed solutions have low complexity compared to conventional beamforming, our work suggests that hierarchical beamforming is a promising candidate for implementing beamforming in future cellular networks.
Comment: 34 pages

arXiv.org e-Print Archive

HAL-CentraleSupelec
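
The Hierarchical Beamforming abstract above refers to max-min fair rate allocation. The paper's algorithms are specific to hierarchical beamforming and are not reproduced here; as a generic, hedged illustration of what max-min fairness means, the sketch below performs classic water-filling (progressive filling) of a single shared capacity among flows with per-flow rate caps. The single-resource model, the function name `max_min_fair`, and the demand caps are assumptions.

```python
def max_min_fair(capacity, demands):
    """Max-min fair split of one shared `capacity` among capped flows.

    Flows are visited in increasing order of demand. A flow whose demand
    fits within an equal share of what is left gets its full demand; once
    a flow does not fit, all remaining flows receive the same equal share.
    """
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    for pos, i in enumerate(order):
        share = remaining / (len(order) - pos)  # equal split of what is left
        if demands[i] <= share:
            alloc[i] = float(demands[i])  # flow is limited by its own demand
            remaining -= demands[i]
        else:
            for j in order[pos:]:  # every remaining flow is bottlenecked
                alloc[j] = share
            break
    return alloc
```

For example, `max_min_fair(10.0, [2.0, 3.0, 8.0, 8.0])` gives the first flow its full demand of 2 and splits the remaining 8 equally (8/3 each) among the three flows that cannot be fully satisfied.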

Multipath streaming: fundamental limits and efficient algorithms

Authors: Combes, Richard; Elayoubi, Salah-Eddine; Sidi, Habib
Publication date: 14/06/2016

We investigate streaming over multiple links. A file is split into small units called chunks, which may be requested on the various links according to some policy and are received after some random delay. After a start-up time called the pre-buffering time, received chunks are played at a fixed speed. Starvation occurs if the chunk to be played has not yet arrived. We provide lower bounds (fundamental limits) on the starvation probability of any policy. We further propose simple, order-optimal policies that require no feedback. For general delay distributions, we provide tractable upper bounds on the starvation probability of the proposed policies, which allow the pre-buffering time to be selected appropriately. We specialize our results to (i) links that employ CSMA or opportunistic scheduling at the packet level, (ii) links shared with a primary user, and (iii) links that use fair rate sharing at the flow level. We consider a generic model, so our results give insight into the design and performance of media streaming over (a) wired networks with several paths between the source and destination, (b) wireless networks featuring spectrum aggregation, and (c) multi-homed wireless networks.
Comment: 24 pages

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1
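
To make the starvation model of the Multipath Streaming abstract above concrete, the sketch below estimates the starvation probability by Monte Carlo simulation. Every modelling choice is an assumption: chunks are assigned to links round-robin, each link delivers its chunks one after another with i.i.d. exponential delays, and chunk i (counting from 0) is played at time prebuffer + i / play_rate. It is not one of the paper's order-optimal policies and does not compute the paper's analytical bounds; the function name and parameters are hypothetical.

```python
import random


def starvation_probability(num_chunks, link_rates, play_rate, prebuffer,
                           runs=2000, seed=1):
    """Monte Carlo estimate of P(some chunk misses its playback deadline).

    Hypothetical model: chunk i goes to link i % L (round-robin); each link
    serves its chunks sequentially, each taking an Exp(rate) delay; chunk i
    must arrive by time prebuffer + i / play_rate.
    """
    rng = random.Random(seed)
    num_links = len(link_rates)
    starved = 0
    for _ in range(runs):
        link_clock = [0.0] * num_links  # arrival time of the last chunk per link
        late = False
        for i in range(num_chunks):
            link = i % num_links  # hypothetical round-robin policy
            link_clock[link] += rng.expovariate(link_rates[link])
            deadline = prebuffer + i / play_rate
            late = late or link_clock[link] > deadline
        starved += int(late)
    return starved / runs
```

Calling it for increasing `prebuffer` values, e.g. `starvation_probability(200, [1.0, 1.0], 1.5, t0)` for `t0` in 2, 5 and 10, shows how the estimate evolves with the pre-buffering time. Every run draws exactly `num_chunks` delays even after a deadline is missed, so with a fixed seed all calls see the same arrival times and the estimate never increases as `prebuffer` grows.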